Results 1 - 20 of 127
1.
Biostatistics ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38649751

ABSTRACT

CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens, "thresholded regression," exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g., Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.
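
The attenuation described in this abstract can be illustrated with a small simulation. The sketch below is not GLM-EIV and not the authors' code. It assumes a Gaussian expression model, Poisson gRNA counts, and hypothetical parameter values (`n_cells`, `effect`, the background rate 0.3 and the signal rate 2.0), and it calls a cell "perturbed" whenever its gRNA count reaches a threshold.

```python
import numpy as np

def simulate_attenuation(n_cells=20000, effect=-1.0, threshold=1, seed=0):
    """Compare a thresholded estimate with an oracle estimate (toy model)."""
    rng = np.random.default_rng(seed)
    # Latent (unobserved) perturbation status; 10% of cells are truly perturbed.
    perturbed = rng.random(n_cells) < 0.1
    # Observed gRNA counts: background noise everywhere, stronger signal in perturbed cells.
    grna = rng.poisson(np.where(perturbed, 2.0, 0.3))
    # Expression depends on the latent status, not on the noisy count (assumed Gaussian).
    expr = 5.0 + effect * perturbed + rng.normal(0.0, 1.0, n_cells)
    # Thresholding: treat cells with at least `threshold` gRNA counts as perturbed.
    called = grna >= threshold
    thresholded = expr[called].mean() - expr[~called].mean()
    oracle = expr[perturbed].mean() - expr[~perturbed].mean()
    return thresholded, oracle

thresholded, oracle = simulate_attenuation()
print(f"thresholded estimate: {thresholded:.3f}, oracle estimate: {oracle:.3f}")
```

In this toy setup the thresholded estimate is pulled toward zero because the called and uncalled groups both mix truly perturbed and unperturbed cells. Varying `threshold` changes both the amount of shrinkage and the number of called cells, which is one way to see the tuning-parameter sensitivity the abstract mentions.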

2.
bioRxiv ; 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38659821

ABSTRACT

Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core analysis challenges: sparsity, confounding, and model misspecification. Finally, we develop an association testing method, SCEPTRE low-MOI, that resolves these analysis challenges and demonstrates improved calibration and power.

3.
BMC Bioinformatics ; 25(1): 113, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38486150

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. RESULTS: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Software , Single-Cell Analysis/methods
4.
Biometrics ; 80(1), 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38465983

ABSTRACT

In genomics studies, the investigation of gene relationships often brings important biological insights. Currently, the large heterogeneous datasets impose new challenges for statisticians because gene relationships are often local. They change from one sample point to another, may only exist in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness on both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its minibatch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectory, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.


Subject(s)
Algorithms , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Gene Regulatory Networks , Sequence Analysis, RNA/methods
5.
HGG Adv ; 5(2): 100280, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38402414

ABSTRACT

Polygenic scores (PGSs) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single-nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWASs, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum (JLS). In the simulation settings we explore, JLS provides more accurate PGSs compared to other methods, especially when measured in terms of fairness. In analyses of UK Biobank data, JLS was computationally more efficient but slightly less accurate than a Bayesian comparator, SDPRX. Like all PGS methods, JLS requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how JLS can help mitigate fairness-related harms that might result from the use of PGSs in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWASs for different ancestries, JLS is an effective approach for enhancing portability and reducing predictive bias.


Subject(s)
Genome-Wide Association Study , Health Equity , Humans , Bayes Theorem , Benchmarking , Computer Simulation
6.
bioRxiv ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38045428

ABSTRACT

Background: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. Results: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. Conclusions: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.

7.
bioRxiv ; 2023 Sep 24.
Article in English | MEDLINE | ID: mdl-37790341

ABSTRACT

Polygenic scores (PGS) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWAS, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum. In the simulation settings we explore, Joint-Lassosum provides more accurate PGS compared with other methods, especially when measured in terms of fairness. Like all PGS methods, Joint-Lassosum requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how Joint-Lassosum can help mitigate fairness-related harms that might result from the use of PGS scores in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWAS for different ancestries, Joint-Lassosum is an effective approach for enhancing portability and reducing predictive bias.

8.
ArXiv ; 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37744467

ABSTRACT

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover the latent coefficients. Subsequently, latent factors and primary effects are jointly estimated through lasso-type optimization. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish the identification conditions of various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic z-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
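
The Benjamini-Hochberg step mentioned in this abstract is a standard procedure that can be sketched in a few lines. The function below is a generic illustration of the step-up rule, not the paper's estimation framework; the name `benjamini_hochberg` and the example p-values are assumptions made for the illustration.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # step-up thresholds alpha * k / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank meeting its threshold
        rejected[order[: k + 1]] = True            # reject that rank and all smaller p-values
    return rejected

# Hypothetical p-values, e.g. from gene-wise z-tests.
print(benjamini_hochberg([0.039, 0.001, 0.5, 0.008]))
```

The rule rejects every hypothesis whose p-value ranks at or below the largest rank k with p_(k) <= alpha * k / m, so a p-value can be rejected even if it misses its own threshold, provided a larger one passes.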

10.
Science ; 380(6646): eadh7699, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37141313

ABSTRACT

Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124 cis-target genes of 91 noncoding blood trait GWAS loci. Using precise variant insertion through base editing, we connected specific variants with gene expression changes. We also identified trans-effect networks of noncoding loci when cis target genes encoded transcription factors or microRNAs. Networks were themselves enriched for GWAS variants and demonstrated polygenic contributions to complex traits. This platform enables massively parallel characterization of the target genes and mechanisms of human noncoding variants in both cis and trans.


Subject(s)
Disease , Genome-Wide Association Study , Multifactorial Inheritance , Quantitative Trait Loci , Single-Cell Analysis , Humans , Clustered Regularly Interspaced Short Palindromic Repeats , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Proteomics , Blood Cells , RNA-Seq , Disease/genetics
11.
Biometrika ; 110(2): 339-360, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37197740

ABSTRACT

Sparse principal component analysis is an important technique for simultaneous dimensionality reduction and variable selection with high-dimensional data. In this work we combine the unique geometric structure of the sparse principal component analysis problem with recent advances in convex optimization to develop novel gradient-based sparse principal component analysis algorithms. These algorithms enjoy the same global convergence guarantee as the original alternating direction method of multipliers, and can be more efficiently implemented with the rich toolbox developed for gradient methods from the deep learning literature. Most notably, these gradient-based algorithms can be combined with stochastic gradient descent methods to produce efficient online sparse principal component analysis algorithms with provable numerical and statistical performance guarantees. The practical performance and usefulness of the new algorithms are demonstrated in various simulation studies. As an application, we show how the scalability and statistical accuracy of our method enable us to find interesting functional gene groups in high-dimensional RNA sequencing data.

12.
Article in English | MEDLINE | ID: mdl-37121399

ABSTRACT

BACKGROUND: Integrating multiple neuroimaging modalities to identify clusters of individuals and then associating these clusters with psychopathology is a promising approach for understanding neurobiological mechanisms that underlie psychopathology and the extent to which these features are associated with clinical symptoms. METHODS: We leveraged neuroimaging data from T1-weighted, diffusion-weighted, and resting-state functional magnetic resonance images from the Adolescent Brain Cognitive Development (ABCD) Study (N = 8035) and used similarity network fusion and spectral clustering to identify subgroups of participants. We examined neuroimaging measures as a function of clustering profiles using 1, 2, or 3 imaging modalities (i.e., data combinations), calculated the stability of the clustering assignment in each respective data combination, and compared the consistency of clusters across different data combinations. We then compared the extent to which clusters were associated with overall psychopathology at the baseline assessment and at 2 yearly follow-up visits. RESULTS: The optimal number of clusters ranged from 2 to 4 subgroups across data combinations. Clusters were stable across subsampling of the ABCD Study cohort. Widespread structural measures (surface area, fractional anisotropy, and mean diffusivity) were important features contributing to clustering across different data combinations. Five of the seven data combinations were associated with overall psychopathology, both at baseline and over time (d = 0.08-0.41). Generally, lower global cortical volume and surface area, widespread reduced fractional anisotropy, and increased radial diffusivity were associated with increased overall psychopathology. CONCLUSIONS: Profiles constructed from neuroimaging data combinations are associated with concurrent and future psychopathology trajectories.


Subject(s)
Brain , Mental Disorders , Humans , Adolescent , Brain/pathology , Diffusion Tensor Imaging/methods , Magnetic Resonance Imaging/methods , Neuroimaging
13.
medRxiv ; 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36945630

ABSTRACT

Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.

14.
Proc Natl Acad Sci U S A ; 119(49): e2214414119, 2022 Dec 06.
Article in English | MEDLINE | ID: mdl-36459654

ABSTRACT

Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusive to a single source; combining information across these sources is called mosaic integration. The difficulties lie in imputing missing molecular layers to build a self-consistent atlas, finding a common latent space, and transferring learning to new data sources robustly. Existing mosaic integration approaches based on matrix factorization cannot efficiently adapt to nonlinear embeddings for the latent cell space and are not designed for accurate imputation of missing molecular layers. By contrast, we propose a probabilistic variational autoencoder model, scVAEIT, to integrate and impute multimodal datasets with mosaic measurements. A key advance is the use of a missing mask for learning the conditional distribution of unobserved modalities and features, which makes scVAEIT flexible to combine different panels of measurements from multimodal datasets accurately and in an end-to-end manner. Imputing the masked features serves as a supervised learning procedure while preventing overfitting by regularization. Focusing on gene expression, protein abundance, and chromatin accessibility, we validate that scVAEIT robustly imputes the missing modalities and features of cells biologically different from the training data. scVAEIT also adjusts for batch effects while maintaining the biological variation, which provides better latent representations for the integrated datasets. We demonstrate that scVAEIT significantly improves integration and imputation across unseen cell types, different technologies, and different tissues.


Subject(s)
Models, Statistical , Software , Chromatin , Technology
15.
Cell Rep ; 41(5): 111585, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36323256

ABSTRACT

Posttranscriptional RNA modifications by adenosine-to-inosine (A-to-I) editing are abundant in the brain, yet elucidating functional sites remains challenging. To bridge this gap, we investigate spatiotemporal and genetically regulated A-to-I editing sites across prenatal and postnatal stages of human brain development. More than 10,000 spatiotemporally regulated A-to-I sites were identified that occur predominately in 3' UTRs and introns, as well as 37 sites that recode amino acids in protein coding regions with precise changes in editing levels across development. Hyper-edited transcripts are also enriched in the aging brain and stabilize RNA secondary structures. These features are conserved in murine and non-human primate models of neurodevelopment. Finally, thousands of cis-editing quantitative trait loci (edQTLs) were identified with unique regulatory effects during prenatal and postnatal development. Collectively, this work offers a resolved atlas linking spatiotemporal variation in editing levels to genetic regulatory effects throughout distinct stages of brain maturation.


Subject(s)
Inosine , RNA Editing , Humans , Animals , Mice , RNA Editing/genetics , Inosine/genetics , Adenosine/metabolism , Primates , 3' Untranslated Regions , Brain/metabolism , Adenosine Deaminase/metabolism
16.
Proc Natl Acad Sci U S A ; 119(34): e2205518119, 2022 Aug 23.
Article in English | MEDLINE | ID: mdl-35969737

ABSTRACT

Testing the significance of predictors in a regression model is one of the most important topics in statistics. This problem is especially difficult without any parametric assumptions on the data. This paper aims to test the null hypothesis that given confounding variables Z, X does not significantly contribute to the prediction of Y under the model-free setting, where X and Z are possibly high dimensional. We propose a general framework that first fits nonparametric machine learning regression algorithms on [Formula: see text] and [Formula: see text], then compares the prediction power of the two models. The proposed method allows us to leverage the strength of the most powerful regression algorithms developed in the modern machine learning community. The P value for the test can be easily obtained by permutation. In simulations, we find that the proposed method is more powerful compared to existing methods. The proposed method allows us to draw biologically meaningful conclusions from two gene expression data analyses without strong distributional assumptions: 1) testing the prediction power of sequencing RNA for the proteins in cellular indexing of transcriptomes and epitopes by sequencing data and 2) identification of spatially variable genes in spatially resolved transcriptomics data.


Subject(s)
Genomics , Machine Learning , Algorithms , Regression Analysis , Transcriptome
17.
Hum Mol Genet ; 31(3): 481-489, 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-34508597

ABSTRACT

The use of external controls in genome-wide association study (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27 517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone, and that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual level genotypes, rather than summary statistics, are required.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Cohort Studies , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Quality Control
18.
Am J Psychiatry ; 179(3): 216-225, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34789012

ABSTRACT

OBJECTIVE: Obsessive-compulsive disorder (OCD) is known to be substantially heritable; however, the contribution of genetic variation across the allele frequency spectrum to this heritability remains uncertain. The authors used two new homogeneous cohorts to estimate the heritability of OCD from inherited genetic variation and contrasted the results with those of previous studies. METHODS: The sample consisted of 2,090 Swedish-born individuals diagnosed with OCD and 4,567 control subjects, all genotyped for common genetic variants, specifically >400,000 single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) ≥0.01. Using genotypes of these SNPs to estimate distant familial relationships among individuals, the authors estimated the heritability of OCD, both overall and partitioned according to MAF bins. RESULTS: Narrow-sense heritability of OCD was estimated at 29% (SE=4%). The estimate was robust, varying only modestly under different models. Contrary to an earlier study, however, SNPs with MAF between 0.01 and 0.05 accounted for 10% of heritability, and estimated heritability per MAF bin roughly followed expectations based on a simple model for SNP-based heritability. CONCLUSIONS: These results indicate that common inherited risk variation (MAF ≥0.01) accounts for most of the heritable variation in OCD. SNPs with low MAF contribute meaningfully to the heritability of OCD, and the results are consistent with expectation under the "infinitesimal model" (also referred to as the "polygenic model"), where risk is influenced by a large number of loci across the genome and across MAF bins.


Subject(s)
Genome-Wide Association Study , Obsessive-Compulsive Disorder , Alleles , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance , Obsessive-Compulsive Disorder/diagnosis , Obsessive-Compulsive Disorder/genetics , Polymorphism, Single Nucleotide/genetics
19.
Proc Natl Acad Sci U S A ; 118(51), 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34903665

ABSTRACT

Gene coexpression networks yield critical insights into biological processes, and single-cell RNA sequencing provides an opportunity to target inquiries at the cellular level. However, due to the sparsity and heterogeneity of transcript counts, it is challenging to construct accurate gene networks. We develop an approach, locCSN, that estimates cell-specific networks (CSNs) for each cell, preserving information about cellular heterogeneity that is lost with other approaches. LocCSN is based on a nonparametric investigation of the joint distribution of gene expression; hence it can readily detect nonlinear correlations, and it is more robust to distributional challenges. Although individual CSNs are estimated with considerable noise, average CSNs provide stable estimates of networks, which reveal gene communities better than traditional measures. Additionally, we propose downstream analysis methods using CSNs to utilize more fully the information contained within them. Repeated estimates of gene networks facilitate testing for differences in network structure between cell groups. Notably, with this approach, we can identify differential network genes, which typically do not differ in gene expression, but do differ in terms of the coexpression networks. These genes might help explain the etiology of disease. Finally, to further our understanding of autism spectrum disorder, we examine the evolution of gene networks in fetal brain cells and compare the CSNs of cells sampled from case and control subjects to reveal intriguing patterns in gene coexpression.


Subject(s)
Brain/cytology , Gene Regulatory Networks/physiology , Sequence Analysis, RNA , Single-Cell Analysis/methods , Autism Spectrum Disorder/metabolism , Fetus , Gene Expression Regulation , Humans , Neurons , RNA-Seq
20.
Genome Biol ; 22(1): 344, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34930414

ABSTRACT

Single-cell CRISPR screens are a promising biotechnology for mapping regulatory elements to target genes at genome-wide scale. However, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two single-cell CRISPR screens how these challenges cause calibration issues. We propose SCEPTRE: analysis of single-cell perturbation screens via conditional resampling, which infers associations between perturbations and expression by resampling the former according to a working model for perturbation detection probability in each cell. SCEPTRE demonstrates very good calibration and sensitivity on CRISPR screen data, yielding hundreds of new regulatory relationships supported by orthogonal biological evidence.
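
The conditional resampling idea in this abstract can be sketched generically. The code below is not the SCEPTRE software and makes simplifying assumptions: the test statistic is a plain difference in mean expression, the per-cell perturbation probabilities `prob` are supplied by the caller rather than fitted, and the function name and arguments are hypothetical.

```python
import numpy as np

def conditional_resampling_pvalue(expr, perturbed, prob, n_resamples=500, seed=0):
    """Two-sided p-value from resampling perturbation indicators cell by cell."""
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)

    def stat(indicator):
        # Difference in mean expression between perturbed and unperturbed cells.
        return expr[indicator].mean() - expr[~indicator].mean()

    observed = stat(np.asarray(perturbed, dtype=bool))
    null = np.empty(n_resamples)
    for b in range(n_resamples):
        # Redraw each cell's indicator from its own assumed perturbation probability.
        null[b] = stat(rng.random(expr.size) < prob)
    # One-sided p-values with the usual +1 correction, combined into a two-sided value.
    p_left = (1 + np.sum(null <= observed)) / (1 + n_resamples)
    p_right = (1 + np.sum(null >= observed)) / (1 + n_resamples)
    return min(1.0, 2 * min(p_left, p_right))

# Hypothetical data: sequencing depth drives both perturbation detection and expression.
rng = np.random.default_rng(1)
depth = rng.normal(size=2000)
prob = 1.0 / (1.0 + np.exp(-(depth - 1.0)))
perturbed = rng.random(2000) < prob
expr = 2.0 * depth - 3.0 * perturbed + rng.normal(size=2000)
print(conditional_resampling_pvalue(expr, perturbed, prob))
```

Because the resampled indicators follow the same depth-dependent probabilities as the observed ones, the null distribution inherits the confounding, so the observed statistic is compared against a null centered on the confounded value rather than on zero.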


Subject(s)
CRISPR-Cas Systems , Genome, Human , Single-Cell Analysis/methods , Calibration , Chromatin Immunoprecipitation Sequencing/methods , Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing , Gene Expression , Humans